@ben-schwen
Member

@ben-schwen ben-schwen commented Oct 28, 2025

Adds arithmetic support to GForce as requested in #3815, but does not add support for braced blocks in j such as d[, j={x<-x; .(min(x))}, by=y].
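
For readers skimming the thread, here is a minimal sketch of the two forms of j mentioned above. The toy table `d` and the column names `rng` and `mn` are made up for illustration and do not come from the PR. Both calls return correct results either way; the difference described above is only in whether the GForce optimization applies, which depends on the installed data.table version.

```r
library(data.table)

# Toy data, invented for this illustration only.
d = data.table(x = c(1, 5, 2, 8, 3), y = c("a", "a", "b", "b", "b"))

# Arithmetic over aggregates in j: the kind of expression this PR targets.
range_by_y = d[, .(rng = max(x) - min(x)), by = y]

# A braced block in j: still valid, but per the description above it is
# not covered by the new optimization.
block_by_y = d[, j = {x <- x; .(mn = min(x))}, by = y]

print(range_by_y)
print(block_by_y)
```

Adding `verbose = TRUE` to either call should report which optimizations were attempted, though the exact messages vary between versions.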

@codecov

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 99.62121% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 99.01%. Comparing base (f3b166b) to head (283ba85).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/test.data.table.R | 95.23% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7401      +/-   ##
==========================================
- Coverage   99.02%   99.01%   -0.02%     
==========================================
  Files          87       87              
  Lines       16754    16843      +89     
==========================================
+ Hits        16591    16677      +86     
- Misses        163      166       +3     

@github-actions

github-actions bot commented Oct 28, 2025

  • HEAD=modular_gforce slower P<0.001 for setDT improved in #5427
    Comparison Plot

Generated via commit 283ba85

Download link for the artifact containing the test results: ↓ atime-results.zip

| Task | Duration |
|---|---|
| R setup and installing dependencies | 2 minutes and 55 seconds |
| Installing different package versions | 45 seconds |
| Running and plotting the test cases | 5 minutes and 11 seconds |

@ben-schwen ben-schwen marked this pull request as ready for review November 2, 2025 18:01
@ben-schwen
Member Author

I'm also not sure about moving the tests to optimize.Rraw, since this feels kind of wrong and is no longer needed after introducing the new levels/optimization parameter to test().

@ben-schwen ben-schwen mentioned this pull request Nov 2, 2025
@ben-schwen
Member Author

@MichaelChirico I'm also not 100% convinced about the new optimize.Rraw. I guess the whole idea was that we could simply run the script multiple times with different optimization levels. This need was eliminated by adding the optimize parameter to test(), which somehow feels cleaner.

@MichaelChirico
Member

> @MichaelChirico I'm also not 100% convinced about the new optimize.Rraw. I guess the whole idea was that we could simply run the script multiple times with different optimization levels. This need was eliminated by adding the optimize parameter to test(), which somehow feels cleaner.

I see. I still like the idea of a separate script -- the more we peel out of the behemoth tests.Rraw, the better. "eventually" it would be nice to have most tests live in purpose-made test scripts, IMO.
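
To make the "same query at several optimization levels" idea concrete, here is a rough sketch that does not use the internal test() function at all. It relies only on the documented `datatable.optimize` option; the helper `check_across_levels` and the table `DT` are hypothetical names invented for this illustration, and this is not how the PR's optimize parameter is implemented.

```r
library(data.table)

# Hypothetical helper (not part of data.table): evaluate one query under
# several values of the `datatable.optimize` option and check that every
# result matches the first one.
check_across_levels = function(expr, levels = c(0, 1, 2, Inf)) {
  expr = substitute(expr)
  env = parent.frame()
  old = options(datatable.optimize = Inf)
  on.exit(options(old))  # restore the caller's setting on exit
  results = lapply(levels, function(lvl) {
    options(datatable.optimize = lvl)
    eval(expr, env)
  })
  all(vapply(results[-1L],
             function(r) isTRUE(all.equal(results[[1L]], r)),
             logical(1L)))
}

# Toy data, invented for this illustration only.
DT = data.table(a = c(1, 2, 3, 4), b = c(1L, 1L, 2L, 2L))
ok = check_across_levels(DT[, .(s = sum(a), m = mean(a)), by = b])
print(ok)
```

A per-test parameter keeps this loop next to the assertion, whereas a separate script re-run under each level keeps the test bodies free of it; the sketch is agnostic about which is preferable.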

test(2357.2, fread(paste0("file://", f)), DT)
})

# gforce should also work with Map in j #5336
Member

one last idea -- what happens when the grouping column is part of the aggregation in j?

DT[, .(sum(b) - mean(a)), by=b]

Member Author

When the grouping column is part of the aggregation, we turn off GForce, since it will be in .SDall.

data.table/R/data.table.R

Lines 430 to 432 in 8129198

for (ii in seq.int(from=2L, length.out=length(jsub)-1L)) {
  if (!.gforce_ok(jsub[[ii]], SDenv$.SDall, envir)) {GForce = FALSE; break}
}
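
As a small worked example of the query from the question above (a sketch with invented toy data, not a test from the PR): within each group the grouping column is a length-one value, so `sum(b)` is simply that group's value of `b`. The result should be the same whether or not GForce is used.

```r
library(data.table)

# Toy data, invented for this illustration only.
DT = data.table(a = c(1, 2, 3, 4, 5), b = c(1L, 1L, 2L, 2L, 2L))

# The grouping column b appears inside the aggregation in j.
res = DT[, .(V1 = sum(b) - mean(a)), by = b]

# Group b = 1: sum(b) is 1, mean(a) is 1.5
# Group b = 2: sum(b) is 2, mean(a) is 4
print(res)
```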

}

# attempts to optimize j expressions using lapply, GForce, and mean optimizations
.attempt_optimize = function(jsub, jvnames, sdvars, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) {
Member

really like how clean this is 👍

Member

@MichaelChirico MichaelChirico left a comment

About halfway done reading the implementation now. Thanks for your patience with the review! I'm really excited for this to get finished :)


# Optimize expressions using GForce (C-level optimizations)
# This function replaces functions like mean() with gmean() for fast C implementations
.optimize_gforce = function(jsub, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) {
Member

one thing that comes to mind seeing such a long signature -- using a "struct" instead of passing individual arguments, e.g.

https://stackoverflow.com/questions/31864162/what-are-the-pros-and-cons-of-using-a-struct-argument-v-s-multiple-parameters

There may be some possibility to make the code easier to understand if some arguments are grouped or combined.

Not a requirement but something to ponder.

Member Author

Good point. I think if we used structs/lists, then we should probably use them for all helpers here for consistency, no? That would include .optimize_sd_subset, .optimize_c_expr, .optimize_lapply, .optimize_gforce, .optimize_mean and .attempt_optimize.

For .optimize_gforce I can even see a benefit to the long signature, but on the other hand we run into the problem that arguments might get lost in there...
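
For reference, a minimal sketch of what the "struct" idea could look like in R, using a classed list. Every name here (`new_opt_ctx`, `rewrite_mean`, the `opt_ctx` class, the placeholder `fast_mean`) is hypothetical and chosen only for illustration; none of it reflects the helpers in this PR.

```r
# Hypothetical constructor: bundle related arguments into one classed list.
new_opt_ctx = function(jsub, verbose = FALSE, names_x = character()) {
  structure(
    list(jsub = jsub, verbose = verbose, names_x = names_x),
    class = "opt_ctx"
  )
}

# Hypothetical helper: takes the context, returns a possibly modified context.
# It swaps the head of a top-level mean() call for a placeholder symbol.
rewrite_mean = function(ctx) {
  stopifnot(inherits(ctx, "opt_ctx"))
  if (is.call(ctx$jsub) && identical(ctx$jsub[[1L]], as.name("mean"))) {
    ctx$jsub[[1L]] = as.name("fast_mean")  # placeholder, not a real function
    if (ctx$verbose) message("rewrote mean() call")
  }
  ctx
}

ctx = new_opt_ctx(quote(mean(x)))
ctx = rewrite_mean(ctx)
print(ctx$jsub)
```

The trade-off raised above still applies: a single argument shortens every signature, but which fields a helper actually reads is no longer visible at the call site.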

4 participants